Uyghur Text Automatic Segmentation Method Based on Inter-Word Association Degree Measuring
Turdi Tohti, Winira Musajan, Askar Hamdulla
Acta Scientiarum Naturalium Universitatis Pekinensis    2016, 52 (1): 155-164.   DOI: 10.13209/j.0479-8023.2016.023

This paper proposes a new approach and related algorithms for Uyghur text segmentation. Word-based bigrams and contextual information are derived automatically from a large-scale raw corpus. Following Uyghur word association rules, a linear combination of mutual information, difference of t-test, and dual adjacent entropy is used as a new measure of the association strength between two adjacent Uyghur words. Weakly associated inter-word positions are taken as segmentation points, yielding word strings that are complete in both meaning and structure, rather than just the words separated by spaces. Experimental results on a large-scale corpus show that the proposed algorithm achieves 88.21% segmentation accuracy.
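The general idea of scoring each inter-word position and cutting at weak ones can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It combines only two of the three measures (mutual information and adjacent entropy) and leaves out the difference of t-test. The way the two entropies are averaged, the weights `w_mi` and `w_ent`, the `threshold`, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(sentences):
    """Collect unigram, bigram, and neighbor counts from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    right, left = defaultdict(Counter), defaultdict(Counter)
    for words in sentences:
        unigrams.update(words)
        for x, y in zip(words, words[1:]):
            bigrams[(x, y)] += 1
            right[x][y] += 1  # words seen immediately after x
            left[y][x] += 1   # words seen immediately before y
    return unigrams, bigrams, right, left


def entropy(counter):
    """Shannon entropy (in bits) of a count distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def association(x, y, model, w_mi=1.0, w_ent=1.0):
    """Hypothetical linear score: higher means x and y bind more strongly."""
    unigrams, bigrams, right, left = model
    count = bigrams[(x, y)]
    if count == 0:
        return float("-inf")  # unseen pair: treat as unassociated
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    p_xy = count / n_bi
    p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
    mi = math.log2(p_xy / (p_x * p_y))
    # Assumption: varied neighbors on either side suggest a boundary,
    # so the mean of the two adjacent entropies is subtracted.
    h = (entropy(right[x]) + entropy(left[y])) / 2
    return w_mi * mi - w_ent * h


def segment(words, model, threshold=0.0):
    """Split a word sequence wherever the association score is below threshold."""
    if not words:
        return []
    chunks = [[words[0]]]
    for x, y in zip(words, words[1:]):
        if association(x, y, model) < threshold:
            chunks.append([y])
        else:
            chunks[-1].append(y)
    return chunks
```

A model built with `train` on a list of space-tokenized sentences can be passed to `segment` together with a threshold tuned on held-out data. In the paper the statistics come from a large raw Uyghur corpus, and the weights of the full three-term combination would need to be chosen empirically.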
